Overview

Dataset Statistics

Number of Variables 24
Number of Rows 86520
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 53.6 MB
Average Row Size in Memory 649.3 B
Variable Types
  • Geography: 1
  • Categorical: 14
  • Numerical: 8
  • DateTime: 1
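The two memory figures above can be cross-checked against the row count. The sketch below is illustrative only: it assumes that the report's "MB" means 2**20 bytes, which the report does not state.

```python
# Figures copied from the Dataset Statistics block.
n_rows = 86520
avg_row_bytes = 649.3  # "Average Row Size in Memory"

total_bytes = n_rows * avg_row_bytes
total_mib = total_bytes / 1024 ** 2  # assumption: "MB" in the report means 2**20 bytes
total_mb = total_bytes / 1000 ** 2   # decimal megabytes, for comparison

print(f"{total_mib:.1f} MiB vs {total_mb:.1f} MB")
```

Under that assumption the product reproduces the reported 53.6 figure, whereas decimal megabytes would give a larger number.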

Dataset Insights

ratio is skewed
rgb_g_main_col is skewed
article_id_1 has a high cardinality: 476 distinct values
article_id_1 has constant length 6
promo_media_ads has constant length 1
promo_store_event has constant length 1
article_id_2 has constant length 6
rgb_r_sec_col has constant length 3
rgb_g_sec_col has constant length 3
rgb_b_sec_col has constant length 3
label has constant length 1
rgb_b_main_col has 8652 (10.0%) zeros

Variables


country

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6216740

Length

Mean 6.8532
Standard Deviation 0.3539
Median 7
Minimum 6
Maximum 7

Sample

1st row Austria
2nd row Austria
3rd row Austria
4th row Austria
5th row Austria

Letter

Count 592940
Lowercase Letter 506420
Space Separator 0
Uppercase Letter 86520
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Germany, Austria) take over 50.0%
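The "Letter" block reads like a tally of characters by Unicode general category, with Count equal to the uppercase and lowercase totals combined (506420 + 86520 = 592940). A minimal sketch of such a tally follows; the function name `char_class_counts` and the sample list are hypothetical, and "France" is only a stand-in for the third, unnamed six-letter country.

```python
import unicodedata
from collections import Counter


def char_class_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Hypothetical sample: only Austria and Germany are named in the report.
sample = ["Austria", "Germany", "France"]
counts = char_class_counts(sample)

letters = counts["Lu"] + counts["Ll"]  # "Count" = uppercase + lowercase letters
print("Count", letters)
print("Lowercase Letter", counts["Ll"])
print("Space Separator", counts["Zs"])
print("Uppercase Letter", counts["Lu"])
print("Dash Punctuation", counts["Pd"])
print("Decimal Number", counts["Nd"])
```

With one capital per value, the uppercase count equals the number of rows, which matches the 86520 reported above.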

article_id_1

categorical

Approximate Distinct Count 476
Approximate Unique (%) 0.6%
Missing 0
Missing (%) 0.0%
Memory Size 6142920

Length

Mean 6
Standard Deviation 0
Median 6
Minimum 6
Maximum 6

Sample

1st row IR3275
2nd row IR3275
3rd row IR3275
4th row IR3275
5th row IR3275

Letter

Count 173040
Lowercase Letter 0
Space Separator 0
Uppercase Letter 173040
Dash Punctuation 0
Decimal Number 346080
  • article_id_1 has words of constant length

sales

numerical

Approximate Distinct Count 145
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1384320
Mean 34.7428
Minimum 1
Maximum 145
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • sales is skewed right (γ1 = 1.3012)

Quantile Statistics

Minimum 1
5-th Percentile 2
Q1 9
Median 23
Q3 50
95-th Percentile 110
Maximum 145
Range 144
IQR 41

Descriptive Statistics

Mean 34.7428
Standard Deviation 33.5105
Variance 1122.9537
Sum 3.0059e+06
Skewness 1.3012
Kurtosis 0.9554
Coefficient of Variation 0.9645
  • sales is not normally distributed (p-value 1.4674832343595616e-06)
  • sales has 4000 outliers
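Several of the sales figures are derived from one another, so they can be checked by hand. The sketch below recomputes them from the reported mean, standard deviation, quartiles, and the row count in the overview; it uses only values printed in this report.

```python
n_rows = 86520
mean, std = 34.7428, 33.5105
q1, q3 = 9, 50

variance = std ** 2   # reported: 1122.9537
cv = std / mean       # coefficient of variation, reported: 0.9645
total = mean * n_rows # reported Sum: 3.0059e+06
iqr = q3 - q1         # reported: 41

print(f"variance={variance:.2f} cv={cv:.4f} sum={total:.4e} iqr={iqr}")
```

Small differences in the last digit of the variance come from the rounding of the reported standard deviation.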

regular_price

numerical

Approximate Distinct Count 121
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1384320
Mean 49.0723
Minimum 3.95
Maximum 153.95
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • regular_price is skewed right (γ1 = 0.7915)

Quantile Statistics

Minimum 3.95
5-th Percentile 6.95
Q1 25.95
Median 39.95
Q3 69.95
95-th Percentile 107.95
Maximum 153.95
Range 150
IQR 44

Descriptive Statistics

Mean 49.0723
Standard Deviation 31.9631
Variance 1021.6413
Sum 4.2457e+06
Skewness 0.7915
Kurtosis -0.139
Coefficient of Variation 0.6513
  • regular_price is not normally distributed (p-value 0.00045816676136702274)
  • regular_price has 660 outliers
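The report does not say how outliers are counted. A common convention, assumed here and not confirmed by the report, is Tukey's rule: values outside Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. The helper name `tukey_fences` is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the sales and regular_price sections.
sales_lower, sales_upper = tukey_fences(9, 50)
price_lower, price_upper = tukey_fences(25.95, 69.95)

print("sales fences:", sales_lower, sales_upper)
print("regular_price fences:", round(price_lower, 2), round(price_upper, 2))
```

Under this assumption both lower fences are negative, so every flagged value would sit on the high side, between the upper fence and the reported maximum (145 for sales, 153.95 for regular_price).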

current_price

numerical

Approximate Distinct Count 78
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1384320
Mean 25.8548
Minimum 1.95
Maximum 78.95
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • current_price is skewed right (γ1 = 0.956)

Quantile Statistics

Minimum 1.95
5-th Percentile 3.95
Q1 11.95
Median 20.95
Q3 35.95
95-th Percentile 63.95
Maximum 78.95
Range 77
IQR 24

Descriptive Statistics

Mean 25.8548
Standard Deviation 18.1109
Variance 328.0058
Sum 2.237e+06
Skewness 0.956
Kurtosis 0.1583
Coefficient of Variation 0.7005
  • current_price has 1580 outliers

ratio

numerical

Approximate Distinct Count 2336
Approximate Unique (%) 2.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1384320
Mean 0.5467
Minimum 0.2965
Maximum 1
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • ratio is skewed right (γ1 = 0.3843)

Quantile Statistics

Minimum 0.2965
5-th Percentile 0.3029
Q1 0.3646
Median 0.5287
Q3 0.6973
95-th Percentile 0.8808
Maximum 1
Range 0.7035
IQR 0.3327

Descriptive Statistics

Mean 0.5467
Standard Deviation 0.1909
Variance 0.03643
Sum 47300.7553
Skewness 0.3843
Kurtosis -0.8805
Coefficient of Variation 0.3491
  • ratio is not normally distributed (p-value 2.2321285957391753e-18)
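For readers unfamiliar with the γ1 and Kurtosis figures, the sketch below computes moment-based skewness and excess kurtosis in plain Python. It assumes population (biased) moments and the excess convention, in which a normal distribution scores 0; the negative kurtosis values in this report suggest that convention, but the report does not state which estimator it uses. The function name is hypothetical.

```python
def skew_and_excess_kurtosis(values):
    """Return (skewness g1, excess kurtosis g2) from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Toy data, not taken from the dataset.
g1_sym, g2_sym = skew_and_excess_kurtosis([1, 2, 3, 4, 5])
g1_right, _ = skew_and_excess_kurtosis([1, 1, 1, 2, 10])

print(g1_sym, g2_sym)  # symmetric, flat sample: zero skew, negative excess kurtosis
print(g1_right)        # long right tail: positive skew
```

A flat, bounded variable such as ratio or customer_id would be expected to show negative excess kurtosis under this definition, which is consistent with the values reported.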

retailweek

datetime

Approximate Distinct Count 123.1156
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 692288
Minimum 2014-12-28 00:00:00
Maximum 2017-04-30 00:00:00
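The fractional distinct count is an estimate. If retailweek holds one date per week with no gaps (an assumption suggested by the column name, not stated in the report), the exact number of distinct dates follows from the reported minimum and maximum:

```python
from datetime import date

first, last = date(2014, 12, 28), date(2017, 4, 30)

span_days = (last - first).days
weeks = span_days // 7 + 1  # inclusive count of weekly dates

print(span_days, span_days % 7, weeks)
```

The span is a whole number of weeks, and the inclusive count of 123 agrees with the estimate of 123.1156.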

promo_media_ads

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5710320
  • The largest value (0) is over 17.69 times larger than the second largest value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 86520
  • The top 2 categories (0, 1) take over 50.0%
  • promo_media_ads has words of constant length

promo_store_event

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5710320
  • The largest value (0) is over 215.3 times larger than the second largest value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 86520
  • The top 2 categories (0, 1) take over 50.0%
  • promo_store_event has words of constant length

customer_id

numerical

Approximate Distinct Count 4296
Approximate Unique (%) 5.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1384320
Mean 2707.6407
Minimum 1
Maximum 5999
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • customer_id is skewed right (γ1 = 0.2576)

Quantile Statistics

Minimum 1
5-th Percentile 196
Q1 1008.75
Median 1994.5
Q3 4569.25
95-th Percentile 5722
Maximum 5999
Range 5998
IQR 3560.5

Descriptive Statistics

Mean 2707.6407
Standard Deviation 1913.5208
Variance 3.6616e+06
Sum 2.3427e+08
Skewness 0.2576
Kurtosis -1.4354
Coefficient of Variation 0.7067
  • customer_id is not normally distributed (p-value 2.2931323703405887e-07)

article_id_2

categorical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6142920

Length

Mean 6
Standard Deviation 0
Median 6
Minimum 6
Maximum 6

Sample

1st row OC6355
2nd row AP5568
3rd row CB8861
4th row LI3529
5th row GG8661

Letter

Count 173040
Lowercase Letter 0
Space Separator 0
Uppercase Letter 173040
Dash Punctuation 0
Decimal Number 346080
  • article_id_2 has words of constant length

productgroup

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6376524
  • The largest value (SHOES) is over 3.0 times larger than the second largest value (HARDWARE ACCESSORIES)

Length

Mean 8.7
Standard Deviation 5.917
Median 5
Minimum 5
Maximum 20

Sample

1st row SHOES
2nd row SHORTS
3rd row HARDWARE ACCESSORI...
4th row SHOES
5th row SHOES

Letter

Count 735420
Lowercase Letter 0
Space Separator 17304
Uppercase Letter 735420
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (SHOES, HARDWARE ACCESSORIES) take over 50.0%

category

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6419784

Length

Mean 9.2
Standard Deviation 3.8936
Median 8
Minimum 4
Maximum 16

Sample

1st row TRAINING
2nd row TRAINING
3rd row GOLF
4th row RUNNING
5th row RELAX CASUAL

Letter

Count 770028
Lowercase Letter 0
Space Separator 25956
Uppercase Letter 770028
Dash Punctuation 0
Decimal Number 0

cost_article_2

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1384320
Mean 6.517
Minimum 1.29
Maximum 13.29
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • cost_article_2 is skewed right (γ1 = 0.0994)

Quantile Statistics

Minimum 1.29
5-th Percentile 1.29
Q1 2.29
Median 6.95
Q3 9.6
95-th Percentile 13.29
Maximum 13.29
Range 12
IQR 7.31

Descriptive Statistics

Mean 6.517
Standard Deviation 3.9147
Variance 15.3251
Sum 563850.84
Skewness 0.09935
Kurtosis -1.2873
Coefficient of Variation 0.6007
  • cost_article_2 is not normally distributed (p-value 0.00045385139444877845)

style

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6099660
  • The largest value (regular) is over 1.67 times larger than the second largest value (wide)

Length

Mean 5.5
Standard Deviation 1.5
Median 5.5
Minimum 4
Maximum 7

Sample

1st row slim
2nd row regular
3rd row regular
4th row regular
5th row regular

Letter

Count 475860
Lowercase Letter 475860
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (regular, wide) take over 50.0%

sizes

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 7198464
  • The largest value (xxs,xs,s,m,l,xl,xxl) is over 9.0 times larger than the second largest value (xs,s,m,l,xl)

Length

Mean 18.2
Standard Deviation 2.4
Median 19
Minimum 11
Maximum 19

Sample

1st row xxs,xs,s,m,l,xl,xx...
2nd row xxs,xs,s,m,l,xl,xx...
3rd row xxs,xs,s,m,l,xl,xx...
4th row xxs,xs,s,m,l,xl,xx...
5th row xxs,xs,s,m,l,xl,xx...

Letter

Count 1072848
Lowercase Letter 1072848
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (xxs,xs,s,m,l,xl,xxl, xs,s,m,l,xl) take over 50.0%
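The length statistics for sizes follow directly from the two values and their 9-to-1 frequency ratio. The sketch below treats the reported "over 9.0 times" as exactly 9, which is an assumption, and recomputes the mean length, its standard deviation, and the total letter count.

```python
import math

a = "xxs,xs,s,m,l,xl,xxl"  # more frequent value
b = "xs,s,m,l,xl"          # less frequent value
n_rows = 86520

ratio = 9.0                # assumption: the reported ratio is exact
p_a = ratio / (ratio + 1)  # share of rows holding value a
p_b = 1 - p_a

mean_len = p_a * len(a) + p_b * len(b)
std_len = math.sqrt(p_a * p_b) * (len(a) - len(b))  # two-point distribution
letters_a = sum(ch.isalpha() for ch in a)
letters_b = sum(ch.isalpha() for ch in b)
total_letters = (p_a * letters_a + p_b * letters_b) * n_rows

print(round(mean_len, 4), round(std_len, 4), round(total_letters))
```

These reproduce the reported mean of 18.2, standard deviation of 2.4, and letter count of 1072848; the commas account for the gap between string length and letter count.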

gender

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6039096
  • The largest value (women) is over 7.0 times larger than the second largest value (kids)

Length

Mean 4.8
Standard Deviation 0.7483
Median 5
Minimum 3
Maximum 6

Sample

1st row women
2nd row women
3rd row women
4th row kids
5th row women

Letter

Count 415296
Lowercase Letter 415296
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (women, kids) take over 50.0%

rgb_r_main_col

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5874708

Length

Mean 2.9
Standard Deviation 0.3
Median 3
Minimum 2
Maximum 3

Sample

1st row 205
2nd row 188
3rd row 205
4th row 205
5th row 138

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 250908

rgb_g_main_col

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1384320
Mean 139.6
Minimum 26
Maximum 238
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • rgb_g_main_col is skewed left (γ1 = -0.4105)

Quantile Statistics

Minimum 26
5-th Percentile 26
Q1 104
Median 144
Q3 181
95-th Percentile 238
Maximum 238
Range 212
IQR 77

Descriptive Statistics

Mean 139.6
Standard Deviation 63.6419
Variance 4050.2868
Sum 1.2078e+07
Skewness -0.4105
Kurtosis -0.7287
Coefficient of Variation 0.4559
  • rgb_g_main_col is not normally distributed (p-value 3.455445246562142e-08)

rgb_b_main_col

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1384320
Mean 133.5
Minimum 0
Maximum 250
Zeros 8652
Zeros (%) 10.0%
Negatives 0
Negatives (%) 0.0%
  • rgb_b_main_col is skewed left (γ1 = -0.2331)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 57
Median 143
Q3 205
95-th Percentile 250
Maximum 250
Range 250
IQR 148

Descriptive Statistics

Mean 133.5
Standard Deviation 81.1488
Variance 6585.1261
Sum 1.155e+07
Skewness -0.2331
Kurtosis -1.213
Coefficient of Variation 0.6079
  • rgb_b_main_col is not normally distributed (p-value 0.00045385139444877764)

rgb_r_sec_col

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5883360

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 255
2nd row 255
3rd row 255
4th row 164
5th row 164

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 259560
  • The top 2 categories (205, 164) take over 50.0%
  • rgb_r_sec_col has words of constant length

rgb_g_sec_col

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5883360

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 187
2nd row 187
3rd row 187
4th row 211
5th row 211

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 259560
  • The top 2 categories (155, 187) take over 50.0%
  • rgb_g_sec_col has words of constant length

rgb_b_sec_col

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5883360

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 255
2nd row 255
3rd row 255
4th row 238
5th row 238

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 259560
  • The top 2 categories (155, 238) take over 50.0%
  • rgb_b_sec_col has words of constant length

label

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5710320
  • The largest value (0) is over 6.17 times larger than the second largest value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 86520
  • The top 2 categories (0, 1) take over 50.0%
  • label has words of constant length
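For the binary columns, the "times larger" ratios translate into approximate minority-class shares. The sketch below treats each reported ratio as exact, which it may not be given the "over" wording, so the row counts are rough estimates only. The helper name `minority_share` is hypothetical.

```python
def minority_share(ratio):
    """Share of the smaller class when the larger one is `ratio` times as frequent."""
    return 1.0 / (1.0 + ratio)


n_rows = 86520
# Ratios copied from the promo_media_ads, promo_store_event and label sections.
ratios = {"promo_media_ads": 17.69, "promo_store_event": 215.3, "label": 6.17}

estimates = {}
for name, ratio in ratios.items():
    share = minority_share(ratio)
    estimates[name] = round(share * n_rows)  # approximate count of rows with value 1
    print(f"{name}: {share:.2%} of rows, about {estimates[name]} rows")
```

Shares this small matter for any model trained on label or on the promo flags, since a classifier that always predicts 0 would already score high on plain accuracy.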

Interactions

Correlations

Missing Values